Efficient Neuromorphic Computing Enabled by Spin-Transfer Torque: Devices, Circuits and Systems
Present-day computers expend orders of magnitude more computational resources than humans do to perform the various cognitive and perception-related tasks that people carry out routinely every day. This has recently led to a seismic shift in the field of computation, with research efforts directed toward developing a neurocomputer that mimics the human brain using nanoelectronic components, thereby harnessing its efficiency in recognition problems. Bridging the gap between neuroscience and nanoelectronics, this thesis demonstrates the encoding of biological neural and synaptic functionalities in the underlying physics of electron spin. It describes various spin-transfer torque mechanisms that can potentially be utilized to realize neuro-mimetic device structures. A cross-layer perspective, extending from the device to the circuit and system level, is presented to envision the design of an All-Spin neuromorphic processor with on-chip learning functionalities. A device-circuit-algorithm co-simulation framework calibrated to experimental results suggests that such All-Spin neuromorphic systems can potentially achieve almost two orders of magnitude energy improvement over state-of-the-art CMOS implementations.
Sequence Learning using Equilibrium Propagation
Equilibrium Propagation (EP) is a powerful and more biologically plausible alternative to conventional learning frameworks such as backpropagation. The effectiveness of EP stems from the fact that it relies only on local computations and requires solely one kind of computational unit during both of its training phases, enabling greater applicability in domains such as bio-inspired neuromorphic computing. The dynamics of the model in EP are governed by an energy function, and the internal states of the model consequently converge to a steady state following the state-transition rules defined by that energy function. However, by definition, EP requires the input to the model (a convergent RNN) to be static in both phases of training; it is therefore not possible to design a model for sequence classification using EP with an LSTM- or GRU-like architecture. In this paper, we leverage recent developments in modern Hopfield networks to further understand energy-based models and to develop solutions for complex sequence classification tasks using EP, while satisfying its convergence criteria and maintaining its theoretical similarities with recurrent backpropagation. We explore the possibility of integrating modern Hopfield networks as an attention mechanism with the convergent RNN models used in EP, thereby extending its applicability for the first time to two different sequence classification tasks in natural language processing, viz. sentiment analysis (IMDB dataset) and natural language inference (SNLI dataset).
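To make the two-phase mechanics concrete, here is a minimal numpy sketch of an EP update on a toy layered network: a free phase relaxes the states to a fixed point, a weakly nudged phase pulls the output toward the target, and the weight update contrasts the two fixed points using only local quantities. The network shape, the hard-sigmoid nonlinearity, and all hyperparameters are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = lambda s: np.clip(s, 0.0, 1.0)   # hard-sigmoid activation (assumed for illustration)

n_x, n_h, n_o = 4, 8, 2                # toy sizes: input, hidden, output
W1 = rng.normal(0.0, 0.1, (n_x, n_h))  # symmetric input<->hidden coupling
W2 = rng.normal(0.0, 0.1, (n_h, n_o))  # symmetric hidden<->output coupling

def relax(x, y=None, beta=0.0, steps=100, dt=0.5):
    """Relax the state dynamics to a fixed point; beta > 0 weakly nudges
    the output units toward the target y (EP's second phase)."""
    h, o = np.zeros(n_h), np.zeros(n_o)
    for _ in range(steps):
        dh = -h + rho(x @ W1 + o @ W2.T)
        do = -o + rho(h @ W2) + (beta * (y - o) if beta > 0.0 else 0.0)
        h, o = h + dt * dh, o + dt * do
    return h, o

def ep_update(x, y, beta=0.5, lr=0.05):
    """One EP step: contrast the free and nudged fixed points.
    Updates are local (outer products of unit activities), scaled by 1/beta."""
    global W1, W2
    h0, o0 = relax(x)                   # phase 1: free equilibrium
    hb, ob = relax(x, y, beta=beta)     # phase 2: nudged equilibrium
    W1 += (lr / beta) * (np.outer(x, hb) - np.outer(x, h0))
    W2 += (lr / beta) * (np.outer(hb, ob) - np.outer(h0, o0))

x, y = rng.random(n_x), np.array([1.0, 0.0])
for _ in range(200):
    ep_update(x, y)
print(relax(x)[1])  # free-phase output drifts toward the target after training
```

Note that nothing in the update rule references a global error signal: each weight change depends only on the activities of the two units it connects, measured at the two equilibria, which is what makes EP attractive for neuromorphic hardware.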
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
Large language models (LLMs), though growing exceedingly powerful, comprise orders of magnitude fewer neurons and synapses than the human brain, yet require significantly more power/energy to operate. In this work, we propose a novel bio-inspired spiking language model (LM) which aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain. We demonstrate a framework that leverages the average spiking rate of neurons at equilibrium to train a neuromorphic spiking LM using an implicit differentiation technique, thereby overcoming the non-differentiability problem of spiking neural network (SNN) based algorithms without using any type of surrogate gradient. The steady-state convergence of the spiking neurons also allows us to design a spiking attention mechanism, which is critical for developing a scalable spiking LM. Moreover, the convergence of the average spiking rate of neurons at equilibrium is utilized to develop a novel ANN-SNN knowledge distillation technique wherein we use a pre-trained BERT model as the "teacher" to train our "student" spiking architecture. While the primary architecture proposed in this paper is motivated by BERT, the technique can potentially be extended to different kinds of LLMs. Our work is the first to demonstrate the performance of an operational spiking LM architecture on multiple tasks in the GLUE benchmark.
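The training recipe the abstract describes, differentiating through the equilibrium of average spiking rates rather than unrolling individual spikes, follows the general pattern of implicit (deep-equilibrium-style) differentiation. Below is a minimal PyTorch sketch of that pattern under stated assumptions: the fixed-point map f is a generic stand-in (one weight matrix plus a sigmoid), not SpikingBERT's spiking layer, and the "teacher" target is a random tensor standing in for pre-trained BERT features.

```python
import torch

torch.manual_seed(0)
d = 16
W = (0.05 * torch.randn(d, d)).requires_grad_()  # recurrent weights (toy stand-in)
U = (0.10 * torch.randn(d, d)).requires_grad_()  # input weights (toy stand-in)

def f(a, x):
    # Stand-in steady-state map for average spiking rates in [0, 1];
    # the paper derives its map from actual spiking-neuron dynamics.
    return torch.sigmoid(a @ W + x @ U)

def equilibrium(x, fwd_iters=60, bwd_iters=40):
    # Forward: iterate to the fixed point a* = f(a*, x) without tracking grads.
    with torch.no_grad():
        a = torch.zeros(x.shape[0], d)
        for _ in range(fwd_iters):
            a = f(a, x)
    # One differentiable application of f at the fixed point.
    a0 = a.detach().requires_grad_()
    f0 = f(a0, x)
    # Backward: the implicit-function-theorem gradient is (I - J)^{-T} g,
    # with J = df/da at a*; solve v = J^T v + g by fixed-point iteration.
    def hook(g):
        v = g
        for _ in range(bwd_iters):
            v = torch.autograd.grad(f0, a0, v, retain_graph=True)[0] + g
        return v
    f0.register_hook(hook)
    return f0

x = torch.randn(3, d)
teacher = torch.rand(3, d)     # hypothetical stand-in for teacher (BERT) features
a_star = equilibrium(x)
loss = torch.nn.functional.mse_loss(a_star, teacher)  # distillation-style loss
loss.backward()                # gradients flow through the equilibrium implicitly
print(W.grad.norm().item(), U.grad.norm().item())
```

The key property this pattern shares with the paper's framework is that the backward pass never touches the (non-differentiable) spike trains or the forward iteration history: it needs only vector-Jacobian products of the steady-state map at equilibrium, which is why no surrogate gradient is required.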